Abstract. In this paper, we present SeaHorn, a software verification
framework. The key distinguishing feature of SeaHorn is its modular 
design that separates the concerns of the syntax of the programming 
language, its operational semantics, and the verification semantics. Sea- 
Horn encompasses several novelties: it (a) encodes verification condi- 
tions using an efficient yet precise inter-procedural technique, (b) pro- 
vides flexibility in the verification semantics to allow different levels of 
precision, (c) leverages the state-of-the-art in software model checking 
and abstract interpretation for verification, and (d) uses Horn-clauses as 
an intermediate language to represent verification conditions which sim- 
plifies interfacing with multiple verification tools based on Horn-clauses. 
SeaHorn provides users with a powerful verification tool and researchers 
with an extensible and customizable framework for experimenting with 
new software verification techniques. The effectiveness and scalability 
of SeaHorn are demonstrated by an extensive experimental evaluation 
using benchmarks from SV-COMP 2015 and real avionics code. 


1 Introduction 

In this paper, we present SeaHorn, an LLVM-based framework for verifica- 
tion of safety properties of programs. SeaHorn is a fully automated verifier that 
verifies user-supplied assertions as well as a number of built-in safety properties. 
For example, SeaHorn provides built-in checks for buffer and signed integer 
overflows. More generally, SeaHorn is a framework that simplifies development 
and integration of new verification techniques. Its main features are: 

1. It decouples a programming language's syntax and semantics from the
underlying verification technique. Different programming languages include a
diverse assortment of features, many of which are purely syntactic. Handling
them fully is a major effort for new tool developers. We tackle this problem
in SeaHorn by separating the language syntax, its operational semantics, and
the underlying verification semantics, i.e., the semantics used by the
verification engine. Specifically, we use the LLVM front-end(s) to deal with
the idiosyncrasies of the syntax. We use the LLVM intermediate representation
(IR), called the bitcode, to deal with the operational semantics, and apply a
variety of transformations to simplify it further. In principle, since the
bitcode has been formalized [?], this provides us with a well-defined formal
semantics. Finally, we use Constrained Horn Clauses (CHC) to logically
represent the verification condition (VC).

2. It provides an efficient and precise analysis of programs with procedures
using new inter-procedural verification techniques. SeaHorn summarizes the
input-output behavior of procedures efficiently without inlining. The
expressiveness of the summaries is not limited to linear arithmetic (as in our
earlier tools) but extends to richer logics, including, for instance, arrays.
Moreover, it includes a program transformation that lifts deep assertions
closer to the main procedure. This increases the context-sensitivity of
intra-procedural analyses (used both in verification and compiler
optimization), and has a significant impact on our inter-procedural
verification algorithms.

3. It allows developers to customize the verification semantics and offers
users verification semantics with various degrees of precision. SeaHorn is
fully parametric in the (small-step operational) semantics used for the
generation of VCs. The level of abstraction in the built-in semantics varies
from considering only LLVM numeric registers to considering the whole heap
(modeled as a collection of non-overlapping arrays). In addition to generating
VCs based on small-step semantics [?], it can also automatically lift
small-step semantics to large-step (a.k.a. Large Block Encoding, or LBE).

4. It uses Constrained Horn Clauses (CHC) as its intermediate verification 
language. CHC provide a convenient and elegant way to formally represent 
many encoding styles of verification conditions. The recent popularity of 
CHC as an intermediate language for verification engines makes it possible 
to interface SeaHorn with a variety of new and emerging tools. 

5. It builds on the state-of-the-art in Software Model Checking (SMC) and
Abstract Interpretation (AI). SMC and AI have independently led over the
years to the production of analysis tools that have a substantial impact on
the development of real-world software. Interestingly, the two exhibit
complementary strengths and weaknesses (see e.g., [?]). While SMC has so far
proved stronger on software that is mostly control driven, AI is quite
effective on data-dependent programs. SeaHorn combines SMT-based model
checking techniques with program invariants supplied by an abstract
interpretation-based tool.

6. Finally, it is implemented on top of the open-source LLVM compiler
infrastructure. The latter is a well-maintained, well-documented, and
continuously improving framework. It allows SeaHorn users to easily integrate
program analyses, transformations, and other tools that target LLVM.
Moreover, since SeaHorn analyses LLVM IR, it can exploit a rapidly growing
frontier of LLVM front-ends, encompassing a diverse set of languages. SeaHorn
itself is released as open-source as well (source code can be downloaded
from http://seahorn.github.io).

Fig. 1: Overview of SeaHorn architecture.

The design of SeaHorn provides users, developers, and researchers with
an extensible and customizable environment for experimenting with and
implementing new software verification techniques. SeaHorn is implemented in
C++ in the LLVM compiler infrastructure. The overall approach is illustrated
in Figure 1. SeaHorn has been developed in a modular fashion; its architecture
is layered in three parts:

Front-End: Takes as input a program written in a language with an LLVM
front-end (e.g., C) and generates LLVM IR bitcode. Specifically, it performs
the pre-processing and optimization of the bitcode for verification purposes.
More details are reported in Section 2.

Middle-End: Takes as input the optimized LLVM bitcode and emits verification
conditions as Constrained Horn Clauses (CHC). The middle-end is in charge of
selecting the encoding of the VCs and the degree of precision. More details
are reported in Section 3.

Back-End: Takes CHC as input and outputs the result of the analysis. In
principle, any verification engine that digests CHC could be used to
discharge the VCs. Currently, SeaHorn employs several SMT-based model
checking engines based on PDR/IC3 [?], including Spacer [?] and GPDR [?].
In addition, SeaHorn uses the abstract interpretation-based analyzer IKOS
(Inference Kernel for Open Static Analyzers) for providing numerical
invariants¹. More details are reported in Section 4.

The effectiveness and scalability of SeaHorn are demonstrated by our
extensive experimental evaluation in Section 5 and the results of SV-COMP 2015.

Related work. Automated analysis of software is an active area of research.
There is a large number of tools with different capabilities and trade-offs
[?]. Our approach of separating the program semantics from the verification
engine has been previously proposed in numerous tools. Of those, the tool
SMACK is the closest to SeaHorn. Like SeaHorn, SMACK targets programs at the
LLVM-IR level. However, SMACK targets the Boogie intermediate verification
language and Boogie-based verifiers to construct and discharge the proof
obligations. SeaHorn differs from SMACK in several ways: (i) SeaHorn uses CHC
as its intermediate verification language, which allows targeting different
solvers and verification techniques; (ii) it tightly integrates and combines
both state-of-the-art software model checking techniques and abstract
interpretation; and (iii) it provides an automatic inter-procedural analysis
to reason modularly about programs with procedures.

¹ While conceptually, IKOS should run on CHC, currently it uses its own custom IR.

Inter-procedural and modular analysis is critical for scaling verification
tools and has been addressed by many researchers (e.g., [?]). Our approach of
using mixed-semantics [30] as a source-to-source transformation has also been
explored in [37]. While in [37] the mixed-semantics is done at the level of
the verification semantics (Boogie in this case), in SeaHorn it is done at
the front-end level, allowing mixed-semantics to interact with compiler
optimizations.

Constrained Horn clauses have been recently proposed [11] as an intermediate
(or exchange) format for representing verification conditions. However, they
have long been used in the context of static analysis of imperative and
object-oriented languages (e.g., [41, 48]) and more recently adopted by an
increasing number of solvers (e.g., [12, 23, 33, 36, 40]) as well as other
verifiers such as UFO [?], HSF [?], VeriMAP [?], Eldarica [?], and TRACER [?].


2 Pre-processing for Verification 

In our experience, performance of even the most advanced verification algo- 
rithms is significantly impacted by the front-end transformations. In SeaHorn, 
the front-end plays a very significant role in the overall architecture. SeaHorn 
provides two front-ends: a legacy front-end and an inter-procedural front-end. 

The legacy front-end. This front-end has been used by SeaHorn for the SV- 
COMP 2015 competition (for C programs). It was originally developed for 
UFO [?]. First, the input C program is pre-processed with CIL to insert line
markings for printing user-friendly counterexamples, define missing functions 
that are implicitly defined (e.g., malloc-like functions), and initialize all local 
variables. Moreover, it creates stubs for functions whose addresses can be taken 
and replaces function pointers to those functions with function pointers to the 
stubs. Second, the result is translated into LLVM-IR bitcode, using llvm-gcc. 
After that, it performs compiler optimizations and preprocessing to simplify the 
verification task. As a preprocessing step, we further initialize any uninitial- 
ized registers using non-deterministic functions. This is used to bridge the gap 
between the verification semantics (which assumes a non-deterministic assign- 
ment) and the compiler semantics, which tries to take advantage of the undefined 
behavior of uninitialized variables to perform code optimizations. We perform
a number of program transformations such as function inlining, conversion to
static single assignment (SSA) form, dead code elimination, peephole optimiza- 
tions, CFG simplifications, etc. We also internalize all functions to enable global 
optimizations such as replacement of global aggregates with scalars. 

The legacy front-end has been very effective for solving SV-COMP (2013, 
2014, and 2015) problems. However, it has its own limitations: its design is not 
modular and it relies on multiple unsupported legacy tools (such as llvm-gcc 
and LLVM versions 2.6 and 2.9). Thus, it is difficult to maintain and extend. 

The inter-procedural front-end. In this new front-end, SeaHorn can take any
input program that can be translated into LLVM bitcode. For example, SeaHorn
uses clang and gcc via DragonEgg². Our goal is for SeaHorn not to be limited
to C programs, but to be applicable (with various degrees of success) to a
broader set of languages based on LLVM (e.g., C++, Objective C, and Swift).

Once we have obtained LLVM bitcode, the front-end is split into two main 
sub-components. The first one is a pre-processor that performs optimizations 
and transformations similar to the ones performed by the legacy front-end. Such 
pre-processing is optional as its only mission is to optimize the LLVM bitcode 
to make the verification task ‘easier’. The second part is focused on a reduced 
set of transformations mostly required to produce correct results even if the 
pre-processor is disabled. It also performs SSA transformation and internalizes 
functions, but in addition, lowers switch instructions into if-then-elses, en- 
sures only one exit block per function, inlines global initializers into the main 
procedure, and identifies assert-like functions. 

Although this front-end can optionally inline functions similarly to the legacy 
front-end, its major feature is a transformation that can significantly help the 
verification engine to produce procedure summaries. 

One typical problem in proving safety of large programs is that assertions
can be nested very deep inside the call graph. As a result, counterexamples are
longer and it is harder for the verification engine to decide what is relevant
for the property of interest. To mitigate this problem, the front-end provides a
transformation based on the concept of mixed semantics³ [30, 37]. It relies on
the simple observation that any call to a procedure P either fails inside the call
and therefore P does not return, or returns successfully from the call. Based on
this, any call to P can be instrumented as follows:

— if P may fail, then make a copy of P’s body (in main) and jump to the copy. 

— if P may succeed, then make the call to P as usual. Since P is known not to 
fail each assertion in P can be safely replaced with an assume. 

Upon completion, only the main function has assertions and each procedure is 
inlined at most once. The explanation for the latter is that a function call is 

² DragonEgg (http://dragonegg.llvm.org/) is a GCC plugin that replaces GCC's
optimizers and code generators with those from LLVM. As a result, the output
can be LLVM bitcode.

³ The term mixed semantics refers to a combination of small- with big-step
operational semantics.
Original program:

  main ()
    p1 (); p1 ();
    assert (c1);

  p1 ()
    p2 ();
    assert (c2);

  p2 ()
    assert (c3);

Transformed program:

  main_new ()
    if (*) goto p1_entry;
    else p1_new ();
    if (*) goto p1_entry;
    else p1_new ();
    if (¬c1) goto error;
    assume (false);
  p1_entry:
    if (*) goto p2_entry;
    else p2_new ();
    if (¬c2) goto error;
    assume (false);
  p2_entry:
    if (¬c3) goto error;
    assume (false);
  error: assert (false);

  p1_new ()
    p2_new ();
    assume (c2);

  p2_new ()
    assume (c3);

Fig. 2: A program before and after mixed-semantics transformation. 

inlined only if it fails and hence, its call stack can be ignored. A key
property of this transformation is that it preserves reachability and
non-termination properties (see [30] for details). Since this transformation
is not very common in other verifiers, we illustrate how it works on an example.

Example 1 (Mixed-semantics transformation). On the left in Figure 2, we show
a small program consisting of a main procedure calling two other procedures
p1 and p2 with three assertions c1, c2, and c3. On the right, we show the new
program after the mixed-semantics transformation. First, when main calls p1,
the call is transformed into a non-deterministic choice between (a) jumping
into the entry block of p1 or (b) calling p1. Case (a) represents the
situation when p1 fails; it is handled by inlining the body of p1 (labeled by
p1_entry) into main and adding a goto statement to p1_entry. Case (b)
considers the situation when p1 succeeds; hence it simply duplicates the
function p1 but replaces all the assertions with assumptions, since no failure
is possible. Note that while p1 is called twice, it is inlined only once.
Furthermore, each inlined function ends with an "assume (false)" indicating
that execution dies. Hence, any complete execution of a transformed program
corresponds to a bad execution of the original one. Finally, an interesting
side-effect of mixed-semantics is that it can provide some context-sensitivity
to context-insensitive intra-procedural analyses.
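
The behaviour described in Example 1 can be checked on a small executable model. The following Python sketch is only an illustration under stated assumptions: the conditions c1, c2, and c3 are treated as fixed Booleans, each `if (*)` is resolved by an explicit list of choices, and a `goto` into an inlined body is modelled as a call that never returns. The names `Blocked`, `Error`, and the helper functions are hypothetical and are not part of SeaHorn.

```python
from itertools import product


class Blocked(Exception):
    """An assume(...) was violated: this execution is discarded."""


class Error(Exception):
    """An assertion failed (original) or the error label was reached (transformed)."""


def assume(cond):
    if not cond:
        raise Blocked()


def original_fails(c1, c2, c3):
    """Run the original program; True iff some assertion fails."""
    def p2():
        if not c3:
            raise Error()

    def p1():
        p2()
        if not c2:
            raise Error()

    try:
        p1()
        p1()
        if not c1:
            raise Error()
    except Error:
        return True
    return False


def transformed_reaches_error(c1, c2, c3, choices):
    """Run the transformed program under one resolution of its `if (*)` choices."""
    pick = iter(choices)

    def p2_new():            # copy of p2 with assertions turned into assumptions
        assume(c3)

    def p1_new():            # copy of p1 with assertions turned into assumptions
        p2_new()
        assume(c2)

    def p2_entry():          # inlined body of p2; never returns
        if not c3:
            raise Error()
        assume(False)

    def p1_entry():          # inlined body of p1; never returns
        if next(pick):
            p2_entry()
        else:
            p2_new()
        if not c2:
            raise Error()
        assume(False)

    try:
        for _ in range(2):   # the two calls to p1 in main
            if next(pick):
                p1_entry()
            else:
                p1_new()
        if not c1:
            raise Error()
        assume(False)
    except Error:
        return True
    except Blocked:
        return False


def equivalent_on_all_inputs():
    """Error is reachable in the transformed program iff the original fails."""
    for c1, c2, c3 in product([False, True], repeat=3):
        reachable = any(
            transformed_reaches_error(c1, c2, c3, ch)
            for ch in product([False, True], repeat=3)
        )
        if reachable != original_fails(c1, c2, c3):
            return False
    return True
```

At most three non-deterministic choices are consumed on any path, which is why the enumeration uses choice vectors of length three.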


3 Flexible Semantics for Developers 

SeaHorn provides out-of-the-box verification semantics with different degrees 
of precision. Furthermore, to accommodate a variety of applications, SeaHorn 
is designed to be easily extended with a custom semantics as well. In this section, 
we illustrate the various dimensions of semantic flexibility present in SeaHorn. 

Encoding Verification Conditions. SeaHorn is parametric in the semantics used
for VC encoding. It provides two different semantics encodings: (a) a
small-step encoding (exemplified below in Figure 3) and (b) a large-block
encoding (LBE) [?]. A user can choose the encoding depending on the particular
application. In practice, LBE is often more efficient, but small-step might be
more useful if a fine-grained proof or counterexample is needed. For example,
SeaHorn used the LBE encoding in SV-COMP [29].

Fig. 3: (a) Program, (b) Control-Flow Graph, and (c) Verification Conditions.


Regardless of the encoding, SeaHorn uses CHC to encode the VCs. Given 
the sets $\mathcal{F}$ of function symbols, $\mathcal{P}$ of predicate symbols,
and $\mathcal{V}$ of variables, a Constrained Horn Clause (CHC) is a formula

$\forall \mathcal{V} \cdot (\varphi \wedge p_1[X_1] \wedge \cdots \wedge p_k[X_k] \rightarrow h[X])$, for $k \geq 0$

where: $\varphi$ is a constraint over $\mathcal{F}$ and $\mathcal{V}$ with
respect to some background theory; $X_i, X \subseteq \mathcal{V}$ are (possibly
empty) vectors of variables; $p_i[X_i]$ is an application $p(t_1, \ldots, t_n)$
of an n-ary predicate symbol $p \in \mathcal{P}$ for first-order terms $t_i$
constructed from $\mathcal{F}$ and $X_i$; and $h[X]$ is either defined
analogously to $p_i$ or is $\mathcal{P}$-free (i.e., no $\mathcal{P}$ symbols
occur in $h$). Here, $h$ is called the head of the clause and
$\varphi \wedge p_1[X_1] \wedge \ldots \wedge p_k[X_k]$ is called the body. A
clause is called a query if its head is $\mathcal{P}$-free, and otherwise, it
is called a rule. A rule with body true is called a fact. We say a clause is
linear if its body contains at most one predicate symbol, otherwise, it is
called non-linear. In this paper, we follow the Constraint Logic Programming
(CLP) convention of representing Horn clauses as
$h[X] \leftarrow \varphi, p_1[X_1], \ldots, p_k[X_k]$.

A set of CHCs is satisfiable if there exists an interpretation $I$ of the
predicate symbols $\mathcal{P}$ such that each constraint $\varphi$ is true
under $I$. Without loss of generality, to check if a program $\mathcal{A}$
satisfies a safety property $\alpha_{safe}$ amounts to establishing the
(un)satisfiability of CHCs encoding the VCs of $\mathcal{A}$, as described next.
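
The terminology above (query, rule, fact, linear) can be made concrete with a small data structure. The following Python sketch is an illustration only: the classes `Atom` and `Clause` are hypothetical, constraints are kept as opaque strings, and a predicate-free head is encoded as `None`, which is an assumption of this sketch rather than anything prescribed by the definition.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Atom:
    """Application of a predicate symbol to a vector of variables."""
    pred: str
    args: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Clause:
    """head <- constraint, body[0], ..., body[k-1] (CLP convention)."""
    head: Optional[Atom]            # None encodes a predicate-free head
    body: Tuple[Atom, ...] = ()
    constraint: str = "true"

    def is_query(self) -> bool:
        return self.head is None

    def is_rule(self) -> bool:
        return self.head is not None

    def is_fact(self) -> bool:
        # A rule whose body is just `true`.
        return self.is_rule() and not self.body and self.constraint == "true"

    def is_linear(self) -> bool:
        # At most one predicate application in the body.
        return len(self.body) <= 1


# A few clauses in the style of the definitions above.
entry = Clause(head=Atom("p0"))
init = Clause(head=Atom("p1", ("x", "y")), body=(Atom("p0"),),
              constraint="x = 1 and y = 0")
call = Clause(head=Atom("m", ("x",)), body=(Atom("f", ("x",)), Atom("b", ("x",))))
query = Clause(head=None, body=(Atom("p_err"),))
```

Here `entry` is a fact, `init` is a linear rule, `call` is a non-linear rule, and `query` is a query.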


Example 2 (Small-step encoding of VCs using Horn clauses). Figure 3(a) shows
a program which increments two variables x and y within a non-deterministic
loop. After the loop is executed we would like to prove that x cannot be less
than y. Ignoring wraparound situations, it is easy to see that the program is
safe since x and y are initially non-negative numbers and x is greater than y.
Since the loop increases x by a greater amount than y, at its exit x cannot be
smaller than y. Figure 3(b) depicts its corresponding Control Flow Graph (CFG)
and Figure 3(c) shows its VCs encoded as a set of CHCs.
The set of CHCs in Figure 3(c) essentially represents the small-step
operational semantics of the CFG. Each basic block is encoded as a Horn clause.
A basic block label $l_i$ in the CFG is translated into $p_i(X_1, \ldots, X_n)$
such that $p_i \in \mathcal{P}$ and $\{X_1, \ldots, X_n\} \subseteq \mathcal{V}$
is the set of live variables at the entry of block $l_i$. A Horn clause can
model both the control flow and data of each block in a very succinct way. For
instance, the fact (1) represents that the entry block $l_0$ is reachable.
Clause (2) describes that if $l_0$ is reachable then $l_1$ should be reachable
too. Moreover, its body contains the constraints $x = 1 \wedge y = 0$
representing the initial state of the program. Clause (5) models the loop body
by stating that the control flow moves to $l_2$ from $l_1$ after transforming
the state of the program variables through the constraints $x' = x + y$ and
$y' = y + 1$, where the primed versions represent the values of the variables
after the execution of the arithmetic operations. Based on this encoding, the
program in Figure 3(a) is safe if and only if the set of recursive clauses in
Figure 3(c) augmented with the query $p_{err}$ is unsatisfiable. Note that
since we are only concerned about proving unsatisfiability, any safe final
state can be represented by an infinite loop (e.g., clause (8)).
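
To make the link between clauses and invariants concrete, the sketch below checks a candidate interpretation of the loop-head predicate against the three kinds of clauses described in Example 2 (initial state, loop body, error query). It is a bounded sanity check on a finite grid, not a proof and not what a Horn solver does. The candidate invariants and the function names are assumptions of this sketch; the clauses are taken from the prose (x = 1, y = 0 initially; x' = x + y and y' = y + 1 in the loop; error when x < y).

```python
from itertools import product


def strong_inv(x, y):
    """Candidate interpretation of the loop-head predicate (an assumption)."""
    return x >= 1 and y >= 0 and x >= y


def weak_inv(x, y):
    """The property alone; too weak to be inductive."""
    return x >= y


def first_violated_clause(inv, bound=30):
    """Return the name of the first clause violated on the grid, or None."""
    # Initial-state clause: x = 1 and y = 0 imply inv(x, y).
    if not inv(1, 0):
        return "init"
    for x, y in product(range(-bound, bound + 1), repeat=2):
        if not inv(x, y):
            continue
        # Loop-body clause: inv(x, y), x' = x + y, y' = y + 1 imply inv(x', y').
        if not inv(x + y, y + 1):
            return "step"
        # Error query: inv(x, y) together with x < y must be contradictory.
        if x < y:
            return "query"
    return None
```

On this grid the stronger candidate violates no clause, whereas the property `x >= y` on its own already fails the loop-body clause, which illustrates why auxiliary facts such as `y >= 0` matter.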

The SeaHorn middle-end offers a very simple interface for developers to implement
an encoding of the verification semantics that fits their needs. At the core of 
the SeaHorn middle-end lies the concept of a symbolic store. A symbolic store 
simply maps program variables to symbolic values. The other fundamental con- 
cept is how different parts of a program are symbolically executed. The small-step 
verification semantics is provided by implementing a symbolic execution inter- 
face that symbolically executes LLVM instructions relative to the symbolic store. 
This interface is automatically lifted to large-step semantics as necessary. 

Modeling statements with different degrees of abstraction. The SeaHorn middle- 
end includes verification semantics with different levels of abstraction. Those are, 
from the coarsest to the finest: 

Registers only: only models LLVM numeric registers. In this case, the
constraints part of CHC is over the theory of Linear Integer Arithmetic (LIA).

Registers + Pointers (without memory content): models numeric and pointer
registers. This is sufficient to capture pointer arithmetic and determine
whether a pointer is NULL. Memory addresses are also encoded as integers.
Hence, the constraints remain over LIA.

Registers + Pointers + Memory: models numeric and pointer registers and
the heap. The heap is modeled by a collection of non-overlapping arrays. The
constraints are over the combined theories of arrays and LIA.

To model the heap, SeaHorn uses a heap analysis called Data Structure Analysis
(DSA) [?]. In general, DSA is a context-sensitive, field-sensitive heap analysis
that builds an explicit model of the heap. However, in SeaHorn, we use a simpler
context-insensitive variant that is similar to Steensgaard's pointer analysis [?].

In DSA, the memory is partitioned into a heap, a stack, and global objects.
The analysis builds for each function a DS graph where each node represents
a potentially infinite set of memory objects and distinct DS nodes express
disjoint sets of objects. Edges in the graph represent points-to relationships
between DS nodes. Each node is typed, and its type determines the number of
fields and outgoing edges of the node. A node can have one outgoing edge per
field, but each field can have at most one outgoing edge. This restriction is
key for scalability and it is preserved by merging nodes whenever it is
violated. A DS graph also contains call nodes representing the effect of
function calls.

[Upper box: program with procedures main, foo, and bar. The program listing and
the clauses for main (m_assrt, m_err) are not recoverable from the extracted
text.]

f_entry(x, y).
f_exit(x, y, res) <- f_entry(x, y), res = x + y.
f(x, y, res) <- f_exit(x, y, res).

b_entry(x, y).
b_exit(x, y, res, e_out) <- b_entry(x, y), res = x - y,
                            e_out = (x ≥ 0 ∧ y ≥ 0 ∧ x < res).
b(x, y, z, true, true).
b(x, y, z, false, e_out) <- b_exit(x, y, z, e_out).

Fig. 4: A program with procedures (upper) and its verification condition (lower).

Given a DS graph we can map each DS node to an array. Then each memory 
load (read) and store (write) in the LLVM bitcode can be associated with a 
particular DS node (i.e., array). For memory writes, SeaHorn creates a new 
array variable representing the new state of the array after the write operation. 
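
The mapping from DS nodes to arrays, with a fresh array variable per write, can be sketched as follows. This is a simplified illustration, not SeaHorn's implementation: the class `ArrayMemory`, the node names, and the textual `select`/`store` constraints are all hypothetical and only mirror the usual array-theory operations.

```python
class ArrayMemory:
    """One array per DS node; every store introduces a new array version."""

    def __init__(self, nodes):
        # Current version number of the array that models each DS node.
        self.version = {node: 0 for node in nodes}
        # Constraints emitted so far, in program order.
        self.constraints = []

    def name(self, node):
        """Name of the array variable holding the current state of `node`."""
        return f"{node}_{self.version[node]}"

    def load(self, node, index, dst):
        """A memory read: dst is the value of the current array at index."""
        self.constraints.append(f"{dst} = select({self.name(node)}, {index})")

    def store(self, node, index, value):
        """A memory write: a new array variable represents the updated state."""
        old = self.name(node)
        self.version[node] += 1
        self.constraints.append(
            f"{self.name(node)} = store({old}, {index}, {value})"
        )


# Two disjoint DS nodes A and B; a write to A does not affect B.
mem = ArrayMemory(["A", "B"])
mem.store("A", "p", "v")
mem.load("A", "p", "r")
mem.load("B", "q", "s")
```

Because distinct DS nodes denote disjoint sets of objects, the write through `p` only creates a new version of `A`, and the later read from `B` still refers to `B_0`.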


Inter-procedural proofs. For most real programs, verifying a function separately
from each possible caller (i.e., context-sensitivity) is necessary for scalability. The
version of SeaHorn for SV-COMP 2015 achieved full context-sensitivity by
inlining all program functions. Although inlining is often an effective solution for
small and medium-size programs, it is well known that it suffers from an exponential
blow-up in the size of the original program. Even more importantly, inlining
can produce neither inter-procedural proofs nor counterexamples, which are often
highly desired.

We tackle this problem in SeaHorn by providing an encoding that allows
inter-procedural proofs. We illustrate this procedure via the example in
Figure 4. The upper box shows a program with three procedures: main, foo, and
bar. The program swaps two numbers x and y. The procedure foo adds two numbers
and bar subtracts them. At the exit of main we want to prove that the program
indeed swaps the two inputs. To show all relevant aspects of the
inter-procedural encoding, we add a trivial assertion in bar that checks that
whenever x and y are non-negative the input x is greater than or equal to the
return value.

The lower box of Figure 4 illustrates the corresponding verification
conditions encoded as CHCs. The new encoding follows a small-step style, like
the intra-procedural encoding shown in Figure 3, but with two major
distinctions. First, notice that the CHCs are not linear anymore (e.g., the
rule denoted by $m_{assrt}$). Each function call has been replaced with a
summary rule (f and b) representing the effect of calling the functions foo
and bar, respectively. The second difference is how assertions are encoded. In
the intra-procedural case, a program is unsafe if the query $p_{err}$ is
satisfiable, where $p_{err}$ is the head of a CHC associated with a special
basic block to which all can-fail blocks are redirected. However, in the
presence of procedures, assertions can be located deeply in the call graph of
the program, and therefore, we need to modify the CHCs to propagate error to
the main procedure.

In our example, since a call to bar can fail we add two arguments $e_{in}$ and
$e_{out}$ to the predicate b, where $e_{in}$ indicates if there is an error
before the function is called and $e_{out}$ indicates whether the execution of
bar produces an error. By doing this, we are able to propagate the error in
clause $m_{assrt}$ across the two calls to bar. We indicate that no error is
possible at main before any function is called by unifying false with $e_{in}$
in the first occurrence of b. Within a can-fail procedure we skip the body and
set $e_{out}$ to true as soon as an assertion can be violated. Furthermore, if
a function is called and $e_{in}$ is already true we can skip its body (e.g.,
first clause of b). Functions that cannot fail (e.g., foo) are unchanged. The
above program is safe if and only if the query $m_{err}(true)$ is unsatisfiable.
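
The threading of the error flags through the calls can be mimicked by an ordinary executable model. The Python sketch below is an illustration under assumptions: the bodies of foo and bar follow the prose (addition, subtraction, and the trivial assertion in bar), while the way main combines the final flag with the swap check is an assumption of this sketch, since the clauses for main are not reproduced here.

```python
def foo(x, y):
    """foo cannot fail, so its summary carries no error flags."""
    return x + y


def bar(x, y, e_in):
    """Summary of bar as a function: returns (res, e_out)."""
    if e_in:
        # An error already occurred before the call: skip the body and
        # keep the flag set (the result value is irrelevant).
        return None, True
    res = x - y
    # e_out is true exactly when the assertion inside bar is violated.
    e_out = x >= 0 and y >= 0 and x < res
    return res, e_out


def main_reaches_error(x, y):
    """True iff an assertion in bar or the final swap check fails."""
    x_old, y_old = x, y
    x = foo(x, y)
    y, e1 = bar(x, y, False)   # no error is possible before the first call
    x, e2 = bar(x, y, e1)      # the flag of the first call feeds the second
    if e2:
        return True
    return not (x == y_old and y == x_old)
```

For example, starting from x = 3 and y = 5 the three calls compute 8, 3, and 5 in turn, so the values are swapped and no error flag is raised.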

Finally, it is worth mentioning that this propagation of error can be, in
theory, avoided if the mixed-semantics transformation described in Section 2
is applied. However, this transformation assumes that all functions can be
inlined in order to raise all assertions to the main procedure, whereas
recursive functions and functions that contain LLVM indirect branches (i.e.,
branches that can jump to a label within the current function specified by an
address) are not currently inlined in SeaHorn. For these reasons, our
inter-procedural encoding must always consider the propagation of error across
Horn clauses.


4 Verification Engines 


In principle, SeaHorn can be used with any Horn clause-based verification tool. 
In the following, we describe two such tools developed recently by ourselves. 
Notably, the tools discussed below are based on the contrasting techniques of 
SMT-based model checking and Abstract Interpretation, showcasing the wide 
applicability of SeaHorn. 


The SeaHorn Verification Framework 




4.1 SMT-Based Model Checking with Spacer 


Spacer is based on an efficient SMT-based algorithm for model checking
procedural programs [?]. Compared to existing SMT-based algorithms (e.g.,
[31, 40]), the key distinguishing characteristic of Spacer is its
compositionality.
That is, to check safety of an input program, the algorithm iteratively creates and 
checks local reachability queries for individual procedures (or the unknown pred- 
icates of the Horn-clauses). This is crucial to avoid the exponential growth in the 
size of SMT formulas present in approaches based on monolithic Bounded Model 
Checking (BMC). To avoid redundancy and enable reuse, we maintain two kinds 
of summaries for each procedure: may and must. A may (must) summary of a 
procedure is a formula over its input-output parameters that over-approximates 
(under-approximates) the set of all feasible pairs of pre- and post-states. 

However, the creation of new reachability queries and summaries involves
existentially quantifying auxiliary variables (e.g., local variables of a
procedure). To avoid dependencies on such auxiliary variables, we use a
technique called Model Based Projection (MBP) for lazily and efficiently
eliminating existential quantifiers for the theories of Linear Real Arithmetic
and Linear Integer Arithmetic. At a high level, given an existentially
quantified formula $\exists x \cdot \varphi(x, y)$, where $\varphi$ is
quantifier-free, it is expensive to obtain an equivalent quantifier-free
formula $\psi(y)$. Instead, MBP obtains a quantifier-free under-approximation
$\eta(y)$ of $\exists x \cdot \varphi(x, y)$. To ensure that $\eta$ is a useful
under-approximation, MBP uses a model-based approach such that given a model
$M \models \varphi(x, y)$, it ensures that $M \models \eta(y)$.
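
The idea of a model-guided under-approximation can be illustrated on a very restricted fragment. The sketch below is not Spacer's procedure: it assumes a formula that is a conjunction of strict lower and upper bounds on x over a dense order, represents each bound as a hypothetical Python function of y, and requires at least one lower bound. Exact elimination compares every lower bound with every upper bound, whereas the projection keeps only the lower bound that is tightest in the given model.

```python
def mbp_bounds(lowers, uppers, model):
    """
    phi(x, y) is the conjunction of  l(y) < x  for l in lowers
    and  x < u(y)  for u in uppers.  Given a model of phi, return eta(y),
    a quantifier-free under-approximation of  exists x . phi(x, y)
    that is satisfied by the model.
    """
    x, y = model["x"], model["y"]
    assert all(l(y) < x for l in lowers) and all(x < u(y) for u in uppers)
    # The lower bound with the largest value in the model.
    best = max(lowers, key=lambda l: l(y))

    def eta(y):
        # `best` dominates all lower bounds and stays below all upper bounds.
        return (all(l(y) <= best(y) for l in lowers)
                and all(best(y) < u(y) for u in uppers))

    return eta


def exists_x(lowers, uppers, y):
    """Exact elimination over a dense order, for comparison."""
    return all(l(y) < u(y) for l in lowers for u in uppers)


# phi(x, y) = (y < x) and (0 < x) and (x < 10), with model x = 5, y = 3.
lowers = [lambda y: y, lambda y: 0]
uppers = [lambda y: 10]
eta = mbp_bounds(lowers, uppers, {"x": 5, "y": 3})
```

In this instance the projection is `0 <= y < 10`, which holds in the model and implies the exact result `y < 10`, but misses values such as y = -1 that a different model would cover.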

As mentioned in Section 3, SeaHorn models memory operations using the
extensional theory of arrays (ARR). To handle the resulting Horn clauses, we
have recently developed an MBP procedure for ARR. First of all, given a
quantified formula $\exists a \cdot \varphi(a, y)$ where $a$ is an array
variable with index sort $I$ and value sort $V$ and $\varphi$ is
quantifier-free, one can obtain an equivalent formula
$\exists i, v \cdot \varphi(i, v, y)$ where $i$ and $v$ are fresh variables of
sort $I$ and $V$, respectively. This can be achieved by a simple modification
of the decision procedure for ARR by Stump et al. [53] and we skip the details
in the interest of space⁴. We illustrate our MBP procedure below using an
example, which is based on the above approach for eliminating existentially
quantified array variables.

Let $\varphi$ denote
$(b = a[i_1 \leftarrow v_1]) \vee (a[i_2 \leftarrow v_2][i_3] > 5 \wedge a[i_4] > 0)$,
where $a$ and $b$ are array variables whose index and value sorts are both Int,
the sort of integers, and all other variables have sort Int. Here, for an
array $a$, we use $a[i \leftarrow v]$ to denote a store of $v$ into $a$ at
index $i$ and use $a[i]$ to denote the value of $a$ at index $i$. Suppose that
we want to existentially quantify the array variable $a$. Let
$M \models \varphi$. We will consider two possibilities for $M$:


1. Let $M \models b = a[i_1 \leftarrow v_1]$, i.e., $M$ satisfies the array
equality containing $a$. In this case, our MBP procedure substitutes the term
$b[i_1 \leftarrow x]$ for $a$ in $\varphi$, where $x$ is a fresh variable of
sort Int. That is, the result of MBP is
$\exists x \cdot \varphi[b[i_1 \leftarrow x]/a]$.

2. Let $M \models b \neq a[i_1 \leftarrow v_1]$. We use the second disjunct of
$\varphi$ for MBP. Furthermore, let $M \models i_2 \neq i_3$. We then reduce
the term $a[i_2 \leftarrow v_2][i_3]$ to $a[i_3]$ to obtain
$a[i_3] > 5 \wedge a[i_4] > 0$, using the relevant disjunct of the
select-after-store axiom of ARR. We then introduce fresh variables $x_3$ and
$x_4$ to denote the two select terms on $a$, obtaining
$x_3 > 5 \wedge x_4 > 0$. Finally, we add $i_3 = i_4 \wedge x_3 = x_4$ if
$M \models i_3 = i_4$ and add $i_3 \neq i_4$ otherwise, choosing the relevant
case of Ackermann reduction, and existentially quantify $x_3$ and $x_4$.

⁴ The authors thank Nikolaj Bjørner and Kenneth L. McMillan for helpful discussions.

The MBP procedure outlined above for ARR is implemented in Spacer. 
Additionally, the version of Spacer used in SeaHorn contains numerous en-
hancements compared to [35].


4.2 Abstract Interpretation with IKOS 


IKOS is an open-source library of abstract domains with a state-of-the-art
fixed-point algorithm. Available abstract domains include: intervals [19], re-
duced product of intervals with congruences, DBMs, and octagons [44].


SeaHorn users can choose IKOS as the only back-end engine to discharge
proof obligations. However, even if the abstract domain can express the program
semantics precisely, the join and widening operations may lose some precision
during verification. As a consequence, IKOS alone might not
be sufficient as a back-end engine. Instead, a more suitable job for IKOS is to
supply program invariants to the other engines (e.g., Spacer).
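
To make the precision loss concrete, here is a minimal sketch of an interval domain with join and widening. It is not the IKOS API: the `Interval` class, the function `analyze_counter_loop`, and the analyzed loop `x = 0; while (x < 10) x++` are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def join(self, other):
        # Least upper bound: the smallest interval containing both operands.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def widen(self, other):
        # Standard interval widening: any unstable bound jumps to infinity.
        lo = self.lo if other.lo >= self.lo else -math.inf
        hi = self.hi if other.hi <= self.hi else math.inf
        return Interval(lo, hi)


def analyze_counter_loop():
    """Invariant at the loop head of: x = 0; while (x < 10) x++ (integers)."""
    inv = Interval(0, 0)
    while True:
        guarded = Interval(inv.lo, min(inv.hi, 9))        # apply guard x < 10
        after_body = Interval(guarded.lo + 1, guarded.hi + 1)
        new = inv.widen(inv.join(after_body))
        if new == inv:
            return inv
        inv = new


# Join loses the fact that x is never 2 on either branch.
print(Interval(1, 1).join(Interval(3, 3)))
# Widening overshoots the exact loop-head invariant [0, 10].
print(analyze_counter_loop())
```

A narrowing pass or widening thresholds could recover the upper bound in this small case, but neither is sketched here.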

To exemplify this, let us come back to the example from the earlier figure. Spacer
alone can discover x > y but it misses the vital invariant y > 0. Thus, it does
not terminate. In contrast, IKOS alone with the abstract domain of DBMs
can prove safety immediately. Interestingly, Spacer populated with invariants
supplied by IKOS using intervals proves safety even faster.

Although we envision IKOS to be part of the back-end, it is currently part
of the middle-end, translating bitcode to its own custom IR. Note that there is
no technical impediment to moving IKOS to the back-end. Abstract interpretation
tools over Horn clauses have been previously explored successfully, e.g., [32].


5 Experimental Evaluation 

In this section, we describe the results of our evaluation on various C pro- 
gram benchmarks. First, we give an overview of SeaHorn performance at SV- 
COMP 2015 that used the legacy non-inter-procedural front-end. Second, we 
showcase the new inter-procedural verification flow on the hardest (for Sea- 
Horn) instances from the competition. Finally, we illustrate a case study of the 
use of SeaHorn built-in buffer overflow checks on autopilot control software. 

Results of SV-COMP 2015. For the competition, we used the legacy front-
end described earlier. The middle-end was configured with the large-step
semantics and the most precise level of small-step verification semantics (i.e.,
registers, pointers, and heap). Note, however, that for most benchmarks the heap
is almost completely eliminated by the front-end. IKOS with interval abstract
domain and Z3-PDR were used on the back-end. Detailed results can be found
at http://tinyurl.com/svcomp15

Fig. 5: Spacer vs. Z3-PDR on hard benchmarks (a) with and (b) without inlining

Overall, SeaHorn won one gold medal in the Simple category (benchmarks
that depend mostly on control-flow structure and integer variables) and two silver
medals in the categories Device Drivers and Control Flow. The former is a set
of benchmarks derived from the Linux device drivers and includes a variety of C
features including pointers. The latter is a set of benchmarks dependent mostly
on the control-flow structure and integer variables. In the device drivers cate-
gory, SeaHorn was beaten only by BLAST, a tool tuned to analyzing Linux
device drivers. Specifically, BLAST got 88% of the maximum score while Sea-
Horn got 85%. The Control Flow category was won by CPAChecker, which got
74% of the maximum score, while SeaHorn got 69%. However, SeaHorn is
significantly more efficient than most other tools, solving most benchmarks much
faster.

Results on Hard Benchmarks. SeaHorn participated in SV-COMP 2015 
with the legacy front-end and using Z3-PDR as the verification back-end. To 
test the efficiency of the new verification framework in SeaHorn, we ran several 
experiments on the 215 benchmarks that we either could not verify or took more 
than a minute to verify in SV-COMP. All experiments have been carried out 
on an Ubuntu machine with a 2.2 GHz AMD Opteron(TM) Processor 6174 and 
516GB RAM with resource limits of 30 minutes and 15GB for each verification 
task. In the scatter plots that follow, a diamond indicates a time-out, a star
indicates a mem-out, and a box indicates an anomaly in the back-end tool.
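
A run under such limits can be sketched with Python's standard library. This is not the harness used for the experiments: the function name `run_limited` is hypothetical, the limits are enforced as wall-clock time and address-space size (whether the 30-minute limit was wall-clock or CPU time is not stated, so wall-clock is an assumption), and the `resource` module makes the sketch Unix-only.

```python
import resource
import subprocess
import sys


def run_limited(cmd, timeout_s=30 * 60, mem_bytes=15 * 1024**3):
    """Run cmd under a wall-clock limit and an address-space limit (Unix only)."""

    def set_memory_limit():
        # Runs in the child just before exec; lower only the soft limit and
        # never exceed the hard limit already imposed on this process.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = mem_bytes if hard == resource.RLIM_INFINITY else min(mem_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s, preexec_fn=set_memory_limit)
    except subprocess.TimeoutExpired:
        return "time-out", ""
    # A non-zero exit may be a mem-out or a tool anomaly; telling them apart
    # depends on the tool's own diagnostics, which this sketch does not parse.
    status = "finished" if proc.returncode == 0 else "failed"
    return status, proc.stdout


# Stand-in command; a real run would invoke the verifier on a benchmark file.
print(run_limited([sys.executable, "-c", "print('ok')"]))
```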

For our first experiment, we used inlining in the front-end, and Figure 5a
shows a scatter plot comparing Z3-PDR and Spacer in the back-end. The plot
clearly shows the advantages of the various techniques we developed in Spacer, 
and in particular, of Model Based Projection for efficiently and lazily eliminating 
existential quantifiers for integers and arrays. 

Figure 5b compares the two back-end tools when SeaHorn is using inter-
procedural encoding. As the plot shows, Z3-PDR runs out of time on most of
the benchmarks whereas Spacer is able to verify many of them.

          Spacer   Spacer BMC   Z3-PDR   Total verified
Safe        21         -           3           21
Unsafe      74        76           7           81

Table 1: Number of hard benchmarks that are verified as safe/unsafe by Spacer
in its normal and BMC mode, and Z3-PDR, with inlining disabled.


As mentioned earlier, inter-procedural encoding is advantageous from a
usability point of view. It turns out that it also makes verification easier overall.
To see the advantage of inter-procedural encoding, we used the same tool, Spacer,
in the back-end and compared the running times with and without inlining in
the front-end. The corresponding figure shows a scatter plot of the running times,
and we see that Spacer takes less time on many benchmarks when inlining is disabled.

Spacer also has a compositional BMC mode (see Section 4.1 for details),
where no additional computation is performed towards invariant generation after
checking safety for a given value of the bound. This helps Spacer show the failure
of safety in two additional hard benchmarks, as shown in Table 1. The table
also shows the number of benchmarks verified by Z3-PDR, the back-end tool
used in SV-COMP, for comparison.


Case Study: Checking Buffer Overflow in Avionics Software. We have
evaluated the SeaHorn built-in buffer overflow checks on two autopilot control
software systems. To prove absence of buffer overflows, we only need to add in the
front-end a new LLVM transformation pass that inserts the corresponding checks in
the bitcode. The middle-end and back-end are unchanged. If SeaHorn proves
the program is safe, then it guarantees that the program is free of buffer overflows.
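
The idea of such an instrumentation pass can be sketched on a toy instruction list. This is not LLVM bitcode or the SeaHorn pass: the tuple-based IR, the `instrument` function, and the assumption that every array has a statically known size are all hypothetical simplifications.

```python
def instrument(program, sizes):
    """Insert an underflow and an overflow assertion before each array access.

    `program` is a list of tuples whose first element is the opcode; loads and
    stores carry the array name and the index variable as the next two fields.
    `sizes` maps each array name to its (assumed statically known) length.
    """
    out = []
    for ins in program:
        if ins[0] in ("load", "store"):
            array, index = ins[1], ins[2]
            out.append(("assert", f"{index} >= 0"))              # underflow check
            out.append(("assert", f"{index} < {sizes[array]}"))  # overflow check
        out.append(ins)  # the original instruction is kept unchanged
    return out


toy = [("load", "buf", "i"), ("add", "i", "1"), ("store", "buf", "j", "v")]
checked = instrument(toy, {"buf": 8})
num_checks = sum(1 for ins in checked if ins[0] == "assert")
print(num_checks, "checks inserted")
```

Because the checks become ordinary assertions, the rest of the pipeline can treat them like any user-supplied assertion.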

Table 2 shows the results of our evaluation comparing SeaHorn with an
abstract interpretation-based static analyzer using IKOS (labelled ANALYZER)
developed at NASA Ames. We have used two open-source autopilot control
software systems: mnav (160K LOC) and paparazzi (20K LOC). Both are versatile
autopilot control software for fixed-wing aircraft and multi-copters. For each
benchmark, we created two versions: one inlining all functions (inlined) and the
other applying the mixed-semantics transformation (mixed). The SeaHorn front-
end instruments the programs with the buffer overflow and underflow checks. In
the middle-end, we use large-step encoding and the inter-procedural encoding
(for mixed). For mnav, we had to model the heap, while for paparazzi, modeling
registers and pointers only was sufficient. For ANALYZER, we neither inline nor
add the checks explicitly as these are handled internally. Both SeaHorn and
ANALYZER used intervals as the abstract domain.

                            ANALYZER     SeaHorn
Program              #C     %W     T     T_F   T_M   T_Spacer   T_FMS   T_Spacer + T_Ikos   T_FMSI
mnav.inlined         607    4.7%   36    2     18    744        764     116 + 52            187
mnav.mixed           815    8.2%   10    1     8     278        287     139 + 5             153
paparazzi.inlined    343    0%     85    2     1     -          3       -                   3
paparazzi.mixed      684    43%    15    1     2     3          6       2 + 1               6

Table 2: A comparison between SeaHorn and ANALYZER on autopilot software.

In Table 2, #C denotes the number of overflow and underflow checks. For
ANALYZER, we show the warning rate %W and the total time of the analysis T.
For SeaHorn, we show the time spent by the front-end (T_F) and the middle-
end (T_M). All times are in seconds. For the back-end, we record the time spent
when Spacer alone is used (T_Spacer) and the time spent when both Spacer
and IKOS are used (T_Spacer + T_Ikos). The columns T_FMS and T_FMSI denote the
total time, from the front-end to the back-end, when Spacer alone and Spacer
together with IKOS are used, respectively. SeaHorn proves absence of buffer
overflows for both benchmarks, while ANALYZER can only do it for paparazzi,
although for mnav the number of warnings was low (4%). To the best of our
knowledge, this is the first time that absence of buffer overflows has been proven
for mnav. For the inlined paparazzi benchmark, SeaHorn was able to discharge
the proof obligations using the front-end only (probably because all global array
accesses were lowered to scalars and all loops are bounded). The performance of
SeaHorn on mnav reveals that the inter-procedural encoding performs significantly
better than the inlined version. Furthermore, IKOS has a significant impact on the
results. In particular, IKOS helps dramatically when the benchmark
is inlined. The best configuration is the inter-procedural encoding with IKOS.
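
The two total-time columns of Table 2 are sums of the per-stage times, which can be checked with a few lines of Python. The numbers below are copied from the mnav rows of Table 2; the dictionary layout and the names `rows` and `totals` are ours.

```python
# Times in seconds for the mnav rows of Table 2.
rows = {
    "mnav.inlined": {"front": 2, "middle": 18, "spacer_alone": 744,
                     "spacer_with_ikos": 116, "ikos": 52},
    "mnav.mixed": {"front": 1, "middle": 8, "spacer_alone": 278,
                   "spacer_with_ikos": 139, "ikos": 5},
}


def totals(row):
    """Return (total with Spacer alone, total with Spacer plus IKOS)."""
    shared = row["front"] + row["middle"]
    return (shared + row["spacer_alone"],
            shared + row["spacer_with_ikos"] + row["ikos"])


for name, row in rows.items():
    alone, with_ikos = totals(row)
    print(f"{name}: {alone}s alone, {with_ikos}s with IKOS, "
          f"ratio {alone / with_ikos:.2f}")
```

For mnav.inlined the recomputed total with IKOS is one second above the reported 187, presumably because the table entries are rounded individually.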


6 Conclusion 

We have presented SeaHorn, a new software verification framework with a 
modular design that separates the concerns of the syntax of the language, its 
operational semantics, and the verification semantics. Building a verifier from 
scratch is a very tedious and time-consuming task. We believe that SeaHorn
is a versatile and highly customizable framework that can significantly ease the
process of building new tools by allowing researchers to experiment only with their
particular techniques of interest. To demonstrate the practicality of this frame-
work, we have shown that SeaHorn is a very competitive verifier for proving safety
properties both for academic benchmarks (SV-COMP) and large industrial soft-
ware (autopilot code).